"Akio Ohtori - RIP Oppo" (akioohtori)
11/08/2020 at 17:09 • Filed to: None | 6 | 31 |
Hello all. Looks like our fire drill this time last year wasn’t wasted, but unfortunately the ever-changing nature of Kinja means most of the tools we used back then are no longer working.
At present, I am not aware of a working automated backup tool. I will update this list as people comment more.
Here is what I am tracking:
Just Jeepin’ Method - WORKING + URL List! [HTML]
Just Jeepin’ updated the code and this is working again! Run it as per the tutorial linked above and you’ll get every post separated by year and month. If you use the --images flag, full-size JPEGs will be saved as well.
If you run the script using --urls-only, it will output a list of all your post URLs for use with other methods.
Limitations: Output is in HTML with no formatting. Images embedded in posts are full size. Excellent archival tool overall.
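If you feed the --urls-only output into another tool, it can help to sanity-check the list first. A minimal Python sketch, assuming the file holds one post URL per line and that Kinja post URLs end in a hyphen plus a numeric post ID (as in the links in this thread). The function name load_post_urls is made up for illustration and is not part of Just Jeepin’s script:

```python
import re

# Assumption: Kinja post URLs end in "-<numeric id>", e.g.
# https://oppositelock.kinja.com/python-archiving-be-fixed-1845614231
POST_ID = re.compile(r"-(\d+)/?$")

def load_post_urls(lines):
    """Return (post_id, url) pairs, skipping blanks, non-post lines and duplicates."""
    seen = set()
    posts = []
    for line in lines:
        url = line.strip()
        match = POST_ID.search(url)
        if not match:
            continue  # blank line or not shaped like a post URL
        post_id = match.group(1)
        if post_id in seen:
            continue  # same post listed twice
        seen.add(post_id)
        posts.append((post_id, url))
    return posts
```

Usage would be something like load_post_urls(open("urls.txt", encoding="utf-8")), where urls.txt is whatever file you redirected the script’s output into.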
Save Page WE (TheRealBicycleBuck) - Working! [HTML]
Good news: this method requires little work and no Python/coding. Follow the directions above and you should be good to go. If you’re up for a little bit of Python, the Just Jeepin’ method (above) now has an option to output a URL list for use with this method.
Limitations: On Chrome, the extension gives ~90% of the files the same name. On Firefox I do not have this issue. Additionally, this will not save posts with special characters (example: colon) in the titles.
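If you end up renaming saved pages yourself, one workaround for both the identical names and the colon problem is to build the filename from the post title plus the post ID. This is a Python sketch under that assumption; safe_filename is a hypothetical helper, not something Save Page WE provides, and it won’t change what the extension itself refuses to save:

```python
import re

# Characters rejected by at least one of the Windows, macOS or Linux filesystems.
UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def safe_filename(title, post_id, extension=".html", max_len=100):
    """Turn a post title into a safe filename; the post ID keeps names unique."""
    cleaned = UNSAFE_CHARS.sub("_", title).strip(" .")
    # Collapse runs of whitespace, cap the length, and fall back if nothing is left.
    cleaned = re.sub(r"\s+", " ", cleaned)[:max_len] or "untitled"
    return f"{cleaned} - {post_id}{extension}"
```

For example, a title like "Review: 1995 Miata" with ID 1845613732 becomes "Review_ 1995 Miata - 1845613732.html".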
Gotham Grabber - Working!? [PDF]
Initially I thought it was a problem with the code, but apparently it was actually my environment. Running the command sudo sysctl -w kernel.unprivileged_userns_clone=1 fixed my sandbox issues and this method is also working now... sort of. A lot of people have reported executing it without issue, so the problem may be in my VM.
Limitations: All or nothing. No way to select certain posts.
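Before blaming the code, you can check whether the kernel setting from the sysctl fix above is already on. A minimal Python sketch; it assumes a Debian/Ubuntu-style kernel where that setting lives at /proc/sys/kernel/unprivileged_userns_clone (on kernels without it, the file simply doesn’t exist and the function returns None):

```python
from pathlib import Path

KNOB = "/proc/sys/kernel/unprivileged_userns_clone"

def userns_clone_enabled(path=KNOB):
    """True/False if the setting exists, None if this kernel doesn't have it."""
    knob = Path(path)
    if not knob.exists():
        return None  # nothing to toggle here, so this setting isn't the culprit
    return knob.read_text().strip() == "1"
```

If it returns False, that matches the sandbox failure described above and the sysctl command is worth trying.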
extractKinja [HTML]
A newcomer from Jb boin! I haven’t tried it yet, but it seems promising.
I made a tool to extract articles from Kinja blogs and only keep the content part of the article (no header/footer/comments/“You might also like”) while saving/replacing any external content that would be fetched from Kinja.
HTTrack - Text Only, HTML format
Promising, but not presently working. It does accept a URL text file, which we can generate using the Gotham Grabber code.
I’m hopeful on this one, but can’t get it to download images for some reason. Also something about Kinja makes it want to spider out and download all of the internet. Additionally, it only outputs into HTML, which is not my preference.
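One way to keep a crawler from wandering off is to trim the URL list down to the blog’s own host before handing it over. This Python sketch only does that filtering; it does not drive HTTrack, and the default host is an assumption. Note it also drops anything served from other domains, so if post images live on a separate image host you would need to allow that host as well:

```python
from urllib.parse import urlparse

def same_host_only(urls, allowed_host="oppositelock.kinja.com"):
    """Keep only http(s) URLs on the allowed host, preserving order."""
    kept = []
    for url in urls:
        url = url.strip()
        parts = urlparse(url)
        # urlparse lowercases .hostname, so mixed-case hosts still match.
        if parts.scheme in ("http", "https") and parts.hostname == allowed_host:
            kept.append(url)
    return kept
```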
Linux htmldoc
Unclear what is going wrong here, but the tool just hangs when pointed to a Kinja URL.
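If a tool hangs, wrapping it in a timeout at least lets a batch run keep going instead of stalling on one post. A generic Python sketch; run_with_timeout is a made-up helper, and whether htmldoc behaves any better on a locally saved copy than on a live Kinja URL is untested:

```python
import subprocess

def run_with_timeout(cmd, seconds=60):
    """Run cmd; return its exit code, or None if it was killed for hanging."""
    try:
        return subprocess.run(cmd, timeout=seconds, capture_output=True).returncode
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising this
```

For example, run_with_timeout(["htmldoc", "--webpage", "-f", "post.pdf", "post.html"], 120) would convert a saved post.html and give up after two minutes.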
Halp?
I can run and write code, but websites and such are 100% out of my ability. I don’t know how they work past basic HTML from the 1990s.
If anyone knows of a working or semi-working method, I’m all ears and happy to test/attempt to improve/run stuff and will keep this updated.
Just Jeepin'
> Akio Ohtori - RIP Oppo
11/08/2020 at 15:06 | 2 |
Thanks for letting me know. I’ll take a look at my tool.
jminer
> Akio Ohtori - RIP Oppo
11/08/2020 at 15:06 | 1 |
I started building something in python last night but my dev environment wasn’t cooperating. I’ve got to run to my storage unit and grab the laptop and Pi I used recently for dev and see if I can make it work.
Tohru
> Just Jeepin'
11/08/2020 at 15:10 | 13 |
Just don’t do it in public or you’ll have the authorities called.
Rusty Vandura - www.tinyurl.com/keepoppo
> Just Jeepin'
11/08/2020 at 15:18 | 3 |
NSFW and all that
Rusty Vandura - www.tinyurl.com/keepoppo
> jminer
11/08/2020 at 15:18 | 5 |
I love it when you talk dirty.
jminer
> Rusty Vandura - www.tinyurl.com/keepoppo
11/08/2020 at 15:20 | 0 |
I had all that tech sitting over in a corner of my hotel room for a week and didn’t touch any of it so I put it all away last weekend. Now I need it :)
TheRealBicycleBuck
> Just Jeepin'
11/08/2020 at 15:23 | 2 |
If you can modify it to just grab the page URLs, that would be great! I just posted an alternate method for Save Page WE that will download everything from a list of URLs. You could save a few steps for that method by generating a ready-made list of URLs.
TheRealBicycleBuck
> Akio Ohtori - RIP Oppo
11/08/2020 at 15:25 | 0 |
I ran into problems with Just Jeepin’s tool, so I cobbled together an alternate procedure.
https://oppositelock.kinja.com/an-alternative-method-for-saving-your-posts-1845613732
barnie
> Akio Ohtori - RIP Oppo
11/08/2020 at 15:58 | 1 |
My sincerest appreciation to all you guys trying to save our stuff and make a new home! Lots of good adventures, critters and conversations are recorded here (edit: massive understatement).
If I can help, let me know. The extent of my thoughts isn’t much more than a curl|grep|sed script, but the innertubes are so much more complicated than in my day. I know C and PHP/etc. and a dozen other langs, but won’t do much OO. Again, thank you.
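For the curl|grep|sed crowd, the same idea with Python’s standard library looks roughly like this: fetch a page, then pull out anything shaped like a post link. It’s a sketch; the regex assumes the slug-plus-numeric-ID URL shape seen in this thread, and it will miss any links Kinja only loads via JavaScript:

```python
import re
import urllib.request

# Assumption: post links look like https://<blog>.kinja.com/<slug>-<digits>
POST_LINK = re.compile(r"https://[a-z0-9-]+\.kinja\.com/[A-Za-z0-9-]+-\d+")

def fetch(url):
    """The 'curl' step: download a page as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract_post_links(html):
    """The 'grep | sed' step: unique post-shaped links, in order of first appearance."""
    return list(dict.fromkeys(POST_LINK.findall(html)))
```

Usage: extract_post_links(fetch(some_page_url)).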
Just Jeepin'
> Akio Ohtori - RIP Oppo
11/08/2020 at 16:28 | 0 |
Ok, my software has been fixed. I’d like to tweak it, and I’m also going to add a “--urls-only” argument, but it should work as intended.
Just Jeepin'
> TheRealBicycleBuck
11/08/2020 at 16:38 | 1 |
Ok, latest code push has that feature.
Dead_Elvis, Inc.
> Just Jeepin'
11/08/2020 at 16:38 | 1 |
Just Jeepin'
> jminer
11/08/2020 at 16:53 | 0 |
My python script now works again. https://oppositelock.kinja.com/python-archiving-be-fixed-1845614231
jminer
> Just Jeepin'
11/08/2020 at 16:54 | 0 |
Nice! You're awesome
Just Jeepin'
> jminer
11/08/2020 at 16:55 | 0 |
Well, I should have tested it a week ago when this started hitting the fan. C’est la guerre.
Akio Ohtori - RIP Oppo
> Just Jeepin'
11/08/2020 at 17:11 | 0 |
Thank you so much! Post updated.
Based on the changes you made, is there any chance to update Gotham Grabber in the same way? I’ll look at the diff and see what I can see...
Just Jeepin'
> Akio Ohtori - RIP Oppo
11/08/2020 at 17:19 | 0 |
The HTML changed subtly. Specifically, there was a div with class “js_expandable-container” that used to reliably exist, even if it was often empty, but now no longer does.
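A change like this is easy to detect before a scraper falls over. A minimal sketch using Python’s built-in html.parser that reports whether any element on a page carries a given class; the class name comes from the comment above, and everything else (has_class, ClassFinder) is illustrative:

```python
from html.parser import HTMLParser

class ClassFinder(HTMLParser):
    """Records whether any start tag has the target class in its class list."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Match whole class names only, not substrings.
            if name == "class" and value and self.target in value.split():
                self.found = True

def has_class(html, target="js_expandable-container"):
    finder = ClassFinder(target)
    finder.feed(html)
    finder.close()
    return finder.found
```

Running has_class over a freshly downloaded post before the main scrape gives an early warning when the markup shifts again.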
Akio Ohtori - RIP Oppo
> Just Jeepin'
11/08/2020 at 17:23 | 0 |
Hmm yeah digging into the errors it looks like Gotham Grabber is failing in multiple ways. I think it might be a puppeteer version issue?
Just Jeepin'
> Akio Ohtori - RIP Oppo
11/08/2020 at 17:29 | 0 |
I’ll take a look; I haven’t downloaded the code previously, so this is new to me.
facw
> Akio Ohtori - RIP Oppo
11/08/2020 at 17:30 | 1 |
Note that Gotham Grabber worked mostly fine for me. A few pages gave errors, but most became PDFs. For archiving “most” probably isn’t good enough, but it did seem to generally be functional.
Just Jeepin'
> Akio Ohtori - RIP Oppo
11/08/2020 at 17:37 | 0 |
So, I don’t know what I did differently, but it works for me. I didn’t have Node at all, so I ran:
brew install node
after failing once: npm install puppeteer
Kar Wai Wong
> Akio Ohtori - RIP Oppo
11/08/2020 at 17:42 | 0 |
I’m in contact with a redditor to hopefully get some script thingy ready for a full Kinja archive download, and I’ll probably make an update post soon.
Merkin Muffley
> Akio Ohtori - RIP Oppo
11/08/2020 at 18:00 | 0 |
You guys rock.
Jordan and the Slowrunner, Boomer Intensifies
> Akio Ohtori - RIP Oppo
11/08/2020 at 18:07 | 0 |
So does the Save Page WE actually work for dead sites? It’s just odd since the links work on the downloads. I am very tech unsavvy, so sorry for the newb question.
davesaddiction @ opposite-lock.com
> Akio Ohtori - RIP Oppo
11/08/2020 at 18:39 | 0 |
Nice - can you edit to make clear which ones capture comments as well?
Akio Ohtori - RIP Oppo
> davesaddiction @ opposite-lock.com
11/08/2020 at 19:15 | 0 |
I’ll take a look tomorrow. As far as I am aware none of them capture comments (or all comments, anyway)
TheRealBicycleBuck
> Just Jeepin'
11/08/2020 at 19:19 | 0 |
Excellent!
I’ll have to download it and give it a shot.
davesaddiction @ opposite-lock.com
> Akio Ohtori - RIP Oppo
11/08/2020 at 19:47 | 0 |
I thought one of the plug-ins did (if you expanded all comments prior). Still holding out hope for a comprehensive, searchable archive solution.
TheRealBicycleBuck
> davesaddiction @ opposite-lock.com
11/08/2020 at 21:02 | 1 |
Comments on the post or your discussion comments?
For the former, the Save Page WE method will grab the first few that are loaded by Kinja when you jump to the page.
If you want to grab all of your discussion comments, you might be able to do it with the Save Page WE method, just using the Discussions URLs as the input (https://kinja.com/USERNAME/discussions?startTime=....), but I haven’t had time to figure out how the start time iterates from page to page.
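Once someone works out how the startTime values advance, generating the URL list for Save Page WE is the easy part. This Python sketch only assembles URLs in the format shown above from start times you already have; it does not know how to compute the next value, and the number used in the example is made up:

```python
from urllib.parse import quote, urlencode

def discussions_url(username, start_time=None):
    """Build a discussions page URL; start_time is passed through unchanged."""
    base = f"https://kinja.com/{quote(username)}/discussions"
    if start_time is None:
        return base  # first page, no startTime parameter
    return base + "?" + urlencode({"startTime": start_time})
```

For example, discussions_url("USERNAME", 12345) gives https://kinja.com/USERNAME/discussions?startTime=12345.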
davesaddiction @ opposite-lock.com
> TheRealBicycleBuck
11/08/2020 at 23:08 | 1 |
The former - thanks.
Jb boin
> Just Jeepin'
11/08/2020 at 23:59 | 0 |
Did a basic POC with static CSS that keeps most of the formatting while only extracting the content of js_starterpost: http://jbboin.phpnet.org/oppo/extractor/extractKinja.txt
Example
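As a rough illustration of the same extraction idea (not a port of that script), here is a Python sketch that keeps only the markup inside the first element whose class list includes js_starterpost. It assumes reasonably well-formed HTML where every non-void tag is closed, and it does none of the CSS or external-content replacement the real tool does:

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class StarterPostExtractor(HTMLParser):
    """Collects the markup inside the first element whose class list contains target."""

    def __init__(self, target="js_starterpost"):
        super().__init__(convert_charrefs=True)
        self.target = target
        self.depth = 0      # nesting depth inside the target element; 0 = outside
        self.done = False   # set once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if self.target in classes:
                self.depth = 1  # entered the target; its own tag is not kept
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <img ... />: keep them, depth is unchanged.
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # closing tag of the target itself
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(escape(data, quote=False))

def extract_starter_post(html):
    parser = StarterPostExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

The header, footer and “You might also like” blocks fall outside the target element, so they are simply never collected.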